9 research outputs found

    The relationship between first language acquisition and dialect variation: Linking resources from distinct disciplines in a CLARIN-NL project

    Abstract
    It is remarkable that first language acquisition and historical dialectology should have remained strange bedfellows for so long, considering the common assumption in historical linguistics that language change is due to the non-target transmission of linguistic features, forms and structures between generations, and thus between parents or adults and children. The two disciplines have remained isolated from each other due to, among other things, different research questions, methods of data collection and types of empirical resources. The aim of this paper is to demonstrate that the common assumption in historical linguistics mentioned above can be examined with the help of Digital Humanities projects like CLARIN. The CLARIN infrastructure makes it possible to carry out e-Humanities research by combining datasets from distinct disciplines through tools for data processing. The outcome of the CLARIN-NL COAVA project (acronym of: Cognition, Acquisition and Variation tool) allows researchers to access two datasets from two different subdisciplines simultaneously, namely Dutch child language acquisition files located in CHILDES (MacWhinney, 2000) and historical Dutch dialect dictionaries, through the development of a tool for easy exploration of nouns.
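    The combined access described in this abstract can be illustrated with a toy sketch. The datasets, field names and noun entries below are invented for illustration only; they do not reproduce the COAVA tool, the CHILDES transcripts or the dialect dictionaries:

```python
# Hedged sketch: combined lookup of a noun across two toy datasets,
# loosely mirroring the idea of querying acquisition and dialect
# resources side by side. All data here is hypothetical.
acquisition = {  # invented child-language statistics (CHILDES-style)
    "poes": {"age_first_use_months": 20, "tokens": 41},
    "hond": {"age_first_use_months": 18, "tokens": 57},
}
dialect = {  # invented dialect-dictionary variants per noun
    "poes": ["poes", "kat", "minoes"],
    "hond": ["hond", "rekel"],
}

def explore(noun):
    """Return acquisition and dialect information for one noun together."""
    return {
        "noun": noun,
        "acquisition": acquisition.get(noun),
        "dialect_variants": dialect.get(noun, []),
    }

print(explore("hond")["dialect_variants"])  # -> ['hond', 'rekel']
```

    A real tool would of course query the two resources through their own interfaces rather than in-memory dictionaries; the point is only that both views of a noun become available in a single lookup.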

    Exploring XML-based Technologies and Procedures for Quality

    The use of the Extensible Markup Language (XML) for the annotation of Spoken Language Resources (SLR) is becoming increasingly common. As the SLR validation centre of the European Language Resources Association (ELRA), the Speech Processing EXpertise centre (SPEX) is therefore increasingly confronted with XML. The project "Lexica and Corpora for Speech-to-Speech Translation Components" (LC-STAR) uses XML for the encoding of its resources. For SPEX, XML-based annotations are still relatively new data formats, which is why SPEX is exploring XML-based quality evaluation (validation) technologies and procedures. This is done using the XML-encoded phonetic lexica developed in the LC-STAR project as a test bed.
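    The abstract does not specify SPEX's tooling, but the kind of check it describes can be sketched. The element names `lexicon`, `entry`, `orthography` and `phonetic` below are assumptions for illustration, not the LC-STAR encoding:

```python
import xml.etree.ElementTree as ET

def validate_lexicon(xml_text):
    """Check well-formedness and a few basic structural constraints of a
    hypothetical XML phonetic lexicon. Returns a list of problem strings."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"not well-formed: {e}"]
    problems = []
    if root.tag != "lexicon":
        problems.append(f"unexpected root element: {root.tag}")
    for i, entry in enumerate(root.iter("entry")):
        if entry.find("orthography") is None:
            problems.append(f"entry {i}: missing <orthography>")
        if entry.find("phonetic") is None:
            problems.append(f"entry {i}: missing <phonetic>")
    return problems

sample = """<lexicon>
  <entry><orthography>huis</orthography><phonetic>hYs</phonetic></entry>
  <entry><orthography>straat</orthography></entry>
</lexicon>"""
print(validate_lexicon(sample))  # -> ['entry 1: missing <phonetic>']
```

    Production validation would typically check against a DTD or XML Schema rather than hand-coded constraints; the sketch only shows the shape of such a procedure.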

    SLR Validation: Current Trends and Developments

    This paper deals with the quality evaluation (validation) of Spoken Language Resources (SLR). The current situation in terms of relevant validation criteria and procedures is briefly presented. Next, a number of validation issues related to new data formats (XML-based annotations, UTF-16 encoding) are discussed. Further, new validation cycles that were introduced in a series of new projects like SpeeCon and OrienTel are addressed: prompt sheet validation, lexicon validation and pre-release validation. Finally, SPEX's current and future activities as ELRA's validation centre for SLR are outlined.
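    The UTF-16 issue mentioned in the abstract can be illustrated with a small sketch; this is not SPEX's actual procedure, just a hedged example of detecting a byte-order mark before validating a file:

```python
def detect_encoding(raw: bytes) -> str:
    """Guess the encoding of an annotation file from its byte-order mark.
    Falls back to UTF-8, which is conventionally written without a BOM.
    (A \\xff\\xfe prefix could also signal UTF-32-LE; ignored in this sketch.)"""
    if raw.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if raw.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    return "utf-8"

data = b"\xff\xfe" + "prompt".encode("utf-16-le")  # explicit LE BOM
print(detect_encoding(data))  # -> utf-16-le
```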

    Creating & Testing CLARIN Metadata Components


    A Unified Structure for Dutch Dialect Dictionary Data

    The traditional dialect vocabulary of the Netherlands and Flanders is recorded and researched at several Dutch and Belgian research institutes and universities. Most of these distributed dictionary-creation and research projects collaborate in the "Permanent Overlegorgaan Regionale Woordenboeken" (ReWo). In the project "digital databases and digital tools for WBD and WLD" (D-square), the dialect data published by two of these dictionary projects (Woordenboek van de Brabantse Dialecten and Woordenboek van de Limburgse Dialecten) is being digitised. An additional goal of the D-square project is the development of an infrastructure for electronic access to all dialect dictionaries collaborating in the ReWo. In this paper we will first reconsider the nature of the core data types - form, sense and location - present in the different dialect dictionaries, and the ways these data types are further classified. Next we will focus on the problems encountered when trying to unify these dictionary data and their classifications, and suggest solutions. Finally we will look at several implementation issues regarding a specific encoding for the dictionaries.
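    The three core data types named in the abstract (form, sense, location) suggest a simple unified record. The following sketch is only an illustration under assumed field names and invented example data, not the actual D-square encoding:

```python
from dataclasses import dataclass, field

@dataclass
class DialectEntry:
    """One unified dictionary record: a dialect form, the sense (concept)
    it expresses, and the locations where the form was attested."""
    form: str                                      # dialect word form as recorded
    sense: str                                     # standardized concept / gloss
    locations: list = field(default_factory=list)  # place names or codes
    source: str = ""                               # dictionary of origin, e.g. "WBD" or "WLD"

# Merging attestations of the same form/sense pair (invented data):
a = DialectEntry("sjaol", "scarf", ["Maastricht"], "WLD")
b = DialectEntry("sjaol", "scarf", ["Roermond"], "WLD")
merged = DialectEntry(a.form, a.sense,
                      sorted(set(a.locations + b.locations)), a.source)
print(merged.locations)  # -> ['Maastricht', 'Roermond']
```

    A shared record like this is one way the distributed dictionaries could be queried uniformly, with each project's own classification mapped onto the common sense field.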